Portland

Subtitling Your Life

The New Yorker

A little over thirty years ago, when he was in his mid-forties, my friend David Howorth lost all hearing in his left ear, a calamity known as single-sided deafness. "It happened literally overnight," he said. "My doctor told me, 'We really don't understand why.' " At the time, he was working as a litigator in the Portland, Oregon, office of a large law firm. His hearing loss had no impact on his job--"In a courtroom, you can get along fine with one ear"--but other parts of his life were upended. The brain pinpoints sound sources in part by analyzing minute differences between left-ear and right-ear arrival times, the same process that helps bats and owls find prey they can't see.
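
For scale, the arrival-time differences the brain works with are tiny. A rough back-of-the-envelope figure, assuming a typical adult ear separation of about 21.5 cm and sound traveling at 343 m/s, puts the worst-case left/right gap well under a millisecond:

```python
# Rough upper bound on the interaural time difference: sound arriving
# from directly beside the head travels the full ear-to-ear distance.
ear_separation_m = 0.215   # approximate adult head width, an assumption
speed_of_sound_m_s = 343.0
max_itd_s = ear_separation_m / speed_of_sound_m_s
print(f"{max_itd_s * 1e6:.0f} microseconds")  # ~627 microseconds
```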


Data Valuation with Gradient Similarity

arXiv.org Machine Learning

High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring expert knowledge and considerable manual intervention. Data valuation algorithms are a class of methods that seek to quantify the value of each sample in a dataset based on its contribution or importance to a given predictive task. These data values have shown an impressive ability to identify mislabeled observations, and filtering low-value data can boost machine learning performance. In this work, we present a simple alternative to existing methods, termed Data Valuation with Gradient Similarity (DVGS). This approach can be easily applied to any gradient descent learning algorithm, scales well to large datasets, and performs comparably to or better than baseline valuation methods on tasks such as corrupted-label discovery and noise quantification. We evaluate DVGS on tabular, image, and RNA expression datasets to show its effectiveness across domains. Our approach can rapidly and accurately identify low-quality data, reducing the need for expert knowledge and manual intervention in data-cleaning tasks.
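
The core idea, scoring each training sample by how well its gradient aligns with the gradient on a clean validation set, is simple to sketch. Below is a minimal, hypothetical PyTorch version; the function names, per-epoch scoring schedule, and hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

def flat_grad(model, loss):
    # Gradient of `loss` w.r.t. all model parameters, as one flat vector.
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_similarity_values(model, loss_fn, x_tr, y_tr, x_va, y_va,
                               epochs=3, lr=1e-2):
    # Score each training sample by the cosine similarity between its
    # gradient and the validation gradient, accumulated while training.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    values = torch.zeros(len(x_tr))
    for _ in range(epochs):
        # Reference direction: gradient of the validation loss.
        val_grad = flat_grad(model, loss_fn(model(x_va), y_va))
        for i in range(len(x_tr)):
            g_i = flat_grad(model, loss_fn(model(x_tr[i:i+1]), y_tr[i:i+1]))
            values[i] += torch.nn.functional.cosine_similarity(
                g_i, val_grad, dim=0)
        # One ordinary training step on the full batch.
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()
    return values / epochs  # low values flag likely mislabeled samples

# Toy usage: value 80 training samples for a linear regression model.
x = torch.randn(100, 10)
y = x @ torch.randn(10, 1)
model = nn.Linear(10, 1)
scores = gradient_similarity_values(model, nn.MSELoss(), x[:80], y[:80],
                                    x[80:], y[80:])
```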


Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training

arXiv.org Artificial Intelligence

Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes, using this mapping information as training data. Learned indexes require frequent retraining of their models to incorporate the changes introduced by update queries. To retrain the models efficiently, existing learned index systems often harness a linear algebraic QR factorization technique that performs matrix decomposition. This factorization approach processes all key-position pairs during each retraining, resulting in compute operations that grow linearly with the total number of keys and their lengths. Consequently, retraining creates a severe performance bottleneck, especially for variable-length string keys, even though it is crucial for maintaining high prediction accuracy and, in turn, low query service latency. To address this performance problem, we develop an algorithm-hardware co-designed string-key learned index system, dubbed SIA. In designing SIA, we leverage a unique algorithmic property of the matrix decomposition-based training method: the decomposition results of non-updated keys from previous computations can be reused. Exploiting this property, we develop a memoization-based incremental training scheme that requires computation only over updated keys. We further enhance SIA by offloading a portion of this training process to an FPGA accelerator, which not only relieves CPU resources for serving index queries (i.e., inference) but also accelerates the training itself. Our evaluation shows that, compared to ALEX, LIPP, and SIndex, three state-of-the-art learned index systems, SIA-accelerated learned indexes offer 2.6x and 3.4x higher throughput on two real-world benchmark suites, YCSB and a Twitter cache trace, respectively.
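
The memoization idea can be illustrated with SciPy's QR update routines: instead of refactorizing every key-position pair after an insert, an existing factorization is extended with only the new row. The sketch below is a toy, hypothetical version for a linear string-key model; the featurization, keys, and positions are made-up assumptions, and SIA's actual incremental scheme and FPGA offload go well beyond this.

```python
import numpy as np
from scipy.linalg import qr, qr_insert, solve_triangular

def featurize(keys, width=3):
    # Map string keys to fixed-width numeric features plus a bias term
    # (a simplifying assumption; learned indexes use richer encodings).
    out = np.zeros((len(keys), width + 1))
    for i, k in enumerate(keys):
        padded = k.encode()[:width].ljust(width, b"\0")
        out[i, :width] = np.frombuffer(padded, dtype=np.uint8) / 255.0
        out[i, width] = 1.0
    return out

keys = ["apple", "banana", "cherry", "date", "grape", "kiwi"]
A = featurize(keys)
b = np.arange(len(keys), dtype=float)  # positions in sorted order

Q, R = qr(A)  # full QR of the training matrix, memoized for reuse

# An update query adds a key: extend the memoized factorization with one
# new row instead of refactorizing all key-position pairs from scratch.
u = featurize(["fig"])[0]
Q, R = qr_insert(Q, R, u, k=Q.shape[0], which="row")
b = np.append(b, 4.0)  # hypothetical slot; real systems also shift later keys

# Least-squares solve for the model weights from the updated factors.
n = R.shape[1]
w = solve_triangular(R[:n], (Q.T @ b)[:n])
print("predicted positions:", featurize(keys + ["fig"]) @ w)
```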


LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems

arXiv.org Artificial Intelligence

Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. We introduce LabelAId, an advanced inference model combining Programmatic Weak Supervision (PWS) with FT-Transformers to infer label correctness from user behavior and domain knowledge. Our technical evaluation shows that the LabelAId pipeline consistently outperforms state-of-the-art ML baselines, improving mistake inference accuracy by 36.7% with 50 downstream samples. We then deployed LabelAId in Project Sidewalk, an open-source crowdsourcing platform for urban accessibility. A between-subjects study with 34 participants demonstrates that LabelAId significantly enhances label precision without compromising efficiency, while also increasing labeler confidence. We discuss LabelAId's success factors, limitations, and generalizability to other crowdsourced science domains.
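
The Programmatic Weak Supervision half of such a pipeline is easy to sketch: a handful of heuristic labeling functions vote on whether a crowdsourced label is likely a mistake, and their combined output trains a downstream model. The sketch below uses invented behavior features and a gradient-boosting classifier in place of the paper's FT-Transformer; every feature name and threshold is a hypothetical stand-in.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

ABSTAIN = -1

# Illustrative labeling functions over made-up labeler-behavior features
# (0 = likely mistake, 1 = likely correct, ABSTAIN = no opinion).
def lf_too_fast(row):
    return 0 if row["seconds_on_label"] < 1.0 else ABSTAIN

def lf_zoomed_out(row):
    return 0 if row["zoom_level"] <= 1 else ABSTAIN

def lf_experienced(row):
    return 1 if row["labels_completed"] > 200 else ABSTAIN

def weak_label(row, lfs=(lf_too_fast, lf_zoomed_out, lf_experienced)):
    votes = [lf(row) for lf in lfs if lf(row) != ABSTAIN]
    return int(np.mean(votes) >= 0.5) if votes else ABSTAIN

rng = np.random.default_rng(0)
rows = [{"seconds_on_label": rng.exponential(3.0),
         "zoom_level": int(rng.integers(0, 4)),
         "labels_completed": int(rng.integers(0, 500))}
        for _ in range(1000)]
X = np.array([[r["seconds_on_label"], r["zoom_level"], r["labels_completed"]]
              for r in rows])
y = np.array([weak_label(r) for r in rows])
mask = y != ABSTAIN  # drop rows where every labeling function abstained
clf = GradientBoostingClassifier().fit(X[mask], y[mask])
```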


Dynamic Algorithms for Matroid Submodular Maximization

arXiv.org Artificial Intelligence

Submodular maximization under matroid or cardinality constraints is a classical problem with a wide range of applications in machine learning, auction theory, and combinatorial optimization. In this paper, we consider both variants in the dynamic setting, where (1) we have oracle access to a monotone submodular function $f: 2^{V} \rightarrow \mathbb{R}^+$ and (2) we are given a sequence $\mathcal{S}$ of insertions and deletions of elements of an underlying ground set $V$. We develop the first fully dynamic $(4+\epsilon)$-approximation algorithm for submodular maximization under a matroid constraint, using an expected worst-case query complexity of $O(k\log(k)\log^3{(k/\epsilon)})$, where $0 < \epsilon \le 1$. This resolves an open problem of Chen and Peng (STOC'22) and Lattanzi et al. (NeurIPS'20). As a byproduct, for submodular maximization under a cardinality constraint $k$, we propose a dynamic algorithm, parameterized by the cardinality constraint $k$, that maintains a $(2+\epsilon)$-approximate solution of the sequence $\mathcal{S}$ at any time $t$ with an expected worst-case query complexity of $O(k\epsilon^{-1}\log^2(k))$. This is the first dynamic algorithm for the problem whose query complexity is independent of the size of the ground set $V$.
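
For orientation, the classical static baseline here is the greedy algorithm, which achieves a $(1 - 1/e)$-approximation for monotone submodular maximization under a cardinality constraint. The sketch below shows that baseline on a toy coverage function; it is not the paper's dynamic algorithm, which must also handle insertions and deletions of ground-set elements.

```python
def greedy_cardinality(f, ground, k):
    # Classical greedy for monotone submodular maximization under a
    # cardinality constraint: repeatedly add the element with the
    # largest marginal gain.
    S = frozenset()
    candidates = set(ground)
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in candidates:
            gain = f(S | {e}) - f(S)  # marginal gain of adding e
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no remaining element improves f
            break
        S, candidates = S | {best}, candidates - {best}
    return S

# Toy example: coverage (size of the union) is monotone submodular.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a"}}
f = lambda S: float(len(set().union(*(sets[i] for i in S)))) if S else 0.0
print(greedy_cardinality(f, sets.keys(), k=2))  # frozenset({1, 3})
```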


Machine Learning Scientist at PEAK6 - Portland, OR / Chicago, IL

#artificialintelligence

Apex Fintech Solutions (AFS) powers innovation and the future of digital wealth management by processing millions of transactions daily to simplify, automate, and facilitate access to financial markets for all. Our robust suite of fintech solutions enables us to support clients such as Stash, Betterment, SoFi, and WeBull, along with more than 20 million of our clients' customers. Collectively, AFS creates an environment in which companies with the biggest ideas in fintech are empowered to change the world. We are based in Dallas, TX, and also have offices in Austin, New York, Chicago, Los Angeles, Portland, and Belfast. If you are seeking a fast-paced, entrepreneurial environment where you'll have the opportunity to make an immediate impact, and you have the guts to change everything, this is the place for you.


AI In Insurance Market : Global Opportunity Analysis And Ind...

#artificialintelligence

PORTLAND, OR, USA, November 9, 2022 -- Increase in investment by companies in AI & machine learning and rise in preference for personalized insurance services boost the growth of the global market. Allied Market Research published a report, titled 'AI in Insurance Market by Offering (Hardware, Software, Service), by Deployment Model (On-premise, Cloud), by Technology (Machine Learning, Natural Language Processing, Computer Vision, Others), by Enterprise Size (Large Enterprises, SMEs), by End-user (Life and Health Insurance, Property and Casualty Insurance), by Application (Fraud Detection and Credit Analysis, Customer Profiling and Segmentation, Product and Policy Design, Underwriting and Claims Assessment): Global Opportunity Analysis and Industry Forecast, 2021-2031'. According to the report, the global AI in insurance industry generated $2.74 billion in 2021 and is anticipated to generate $45.74 billion by 2031, witnessing a CAGR of 32.5% from 2022 to 2031. Increase in investment by insurance companies in AI & machine learning, surge in collaboration between insurance companies and AI & machine learning solution companies, and rise in preference for personalized insurance services boost the growth of the global AI in insurance market. However, the high deployment cost of AI & advanced machine learning and a lack of skilled labor hamper market growth. On the contrary, increase in government initiatives and rise in investments to leverage AI technology are expected to offer remunerative opportunities for expansion of the market during the forecast period.
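
The headline numbers quoted above are internally consistent, which a short compounding check confirms (all values taken from the report as quoted):

```python
# Sanity-check the quoted figures: $2.74B (2021) compounding at 32.5% per
# year over the ten years 2022-2031 should land near the projected $45.74B.
base, cagr, years = 2.74, 0.325, 10
print(f"{base * (1 + cagr) ** years:.2f}")          # ~45.70 (billion USD)
print(f"{(45.74 / 2.74) ** (1 / years) - 1:.3%}")   # ~32.5% implied CAGR
```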


Artificial Intelligence in Sports Market Drivers Shaping Future Growth, Revenue USD 19.2 Billion by 2030

#artificialintelligence

Based on sports type, the football segment held the highest market share in 2020, accounting for more than one-fourth of the global artificial intelligence in sports market, and is projected to maintain its leadership status during the forecast period. This is attributed to the adoption of IoT devices such as sensors, GPS trackers, and computer vision algorithms to track the movement of players and balls. However, the basketball segment is expected to manifest the fastest CAGR of 35.0% from 2021 to 2030, owing to the usefulness of AI in game analysis for identifying trends and analyzing the innumerable variations of plays such as pick-and-rolls.


Twitter's data center knocked out by extreme heat in California

Los Angeles Times

Extreme heat that exhausted California's overworked electric grid on Labor Day knocked out one of Twitter's main data centers in Sacramento, according to a report. While Twitter avoided a shutdown on Sept. 5 by leaning on its other data centers, in Portland, Ore., and Atlanta, to keep its systems running during the outage, a company executive warned that if another center were lost, some users would be unable to access the social media platform, according to an internal memo obtained by CNN. Temperatures in Sacramento on Labor Day broke a daily record of 114 degrees, pushing thermometers up to 116 by the afternoon. To power their online services, tech companies such as Twitter, Google, and Meta lean on data centers that demand heavy loads of power and often generate large amounts of heat, requiring cooling systems to keep things running. As climate change continues to heat the planet, Twitter's outage underscores how extreme weather impacts the online systems that billions of people rely on daily.


Fulltime R openings in Portland on August 29, 2022

#artificialintelligence

Detailed JD:
• Minimum of 15 years of technical experience in Oracle ERP (Oracle Cloud/PeopleSoft)
• Experience with preparation of data strategy, migration plan, object dependencies, etc.
• Experience in Oracle Financials Cloud schema and data model
• Experience with master (customer, supplier, COA, etc.) and transaction (GL, PO, AP, etc.) data in Oracle Financials Cloud
• Experience with conducting impact assessment on outbound data payload from Oracle Financials Cloud to data lake
• Experience in creating design documents for accommodating changes to the payload
• Experience with optimizing data transfer (extraction, cleansing, transformation, loading, and validation) from Oracle Financials Cloud to data lake
• Hands-on experience writing complex SQL
• Excellent oral and written communication skills
• Good understanding of the PeopleSoft financial data model
• Experience in data lake architecture

Apply here for the Remote Business Architect role (Portland, OR, remote; 6-12 month contract).